Journal of Chemical Information and Modeling — Latest Matching Preprints

1

Dynamic consensus pocket detection across molecular dynamics ensembles reveals persistent and transient druggable sites

Marigliani, G.; Petrizzelli, F.; Mangoni, M.; Bianco, S. D.; Orzella, I.; Guzzi, P. H.; Caputo, V.; Biagini, T.; Mazza, T.

2026-07-02 bioinformatics 10.64898/2026.06.27.734992 medRxiv

Top 0.1%

72.3%

The traditional 'one drug, one target' paradigm assumes that drugs interact with a single specific binding site. Modern pharmacology has proven this definition overly simplistic and, instead, recognizes that drugs operate within complex biological systems and often interact with multiple targets. In this context, proteins cannot be viewed as possessing a single functional binding site, but rather as dynamic entities capable of accommodating ligands at multiple regions, including transient and cryptic pockets. Here, we review and repurpose representative pocket detection tools across geometry-based, energy-based, and machine/deep learning approaches, originally designed to work on static conformations, to evaluate their agreement on molecular dynamics-derived conformational ensembles. Using the GLUT1 protein as a dynamic transporter model and aldose reductase as a cryptic-pocket reference system, we combine inter-tool concordance, HDBSCAN-based spatial clustering, volumetric IoU analysis, and temporal persistence scoring. Our results show that different algorithmic classes capture complementary aspects of pocket dynamics, with energy-based methods showing stronger sensitivity to transient cryptic regions and geometry-based approaches depending more strongly on pre-formed cavities. This work proposes a consensus-oriented framework for identifying conserved and transient druggable pockets in dynamic protein systems.
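The volumetric IoU and temporal persistence measures named in this abstract can be illustrated with a minimal sketch. It assumes each detected pocket is given as a set of grid voxels and that persistence is the fraction of trajectory frames in which some pocket overlaps a reference pocket above an IoU threshold. The function names, the voxel spacing, and the 0.5 threshold are illustrative assumptions, not taken from the preprint.

```python
from typing import Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float]], spacing: float = 1.0) -> Set[Voxel]:
    """Map 3D points (e.g. pocket probe centers) to integer grid cells."""
    return {tuple(int(c // spacing) for c in p) for p in points}


def volumetric_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two voxel sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def temporal_persistence(
    frames: List[List[Set[Voxel]]],
    reference: Set[Voxel],
    iou_threshold: float = 0.5,  # assumed cutoff, not from the preprint
) -> float:
    """Fraction of frames where any detected pocket matches the reference."""
    if not frames:
        return 0.0
    hits = sum(
        1
        for pockets in frames
        if any(volumetric_iou(p, reference) >= iou_threshold for p in pockets)
    )
    return hits / len(frames)
```

A pocket scoring near 1.0 under this definition would count as persistent, while a low but non-zero score would flag a transient site; where to draw that line is a modeling choice.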

2

TwinSAR: An Adaptive Kernel-based Algorithm with logit-transformed Z-score Filtering for Chemical Twin Detection in Large-scale Virtual Screening

Haris Kulosmanovic, H.; Uguz, C.; DURDAGI, S.

2026-05-15 bioinformatics 10.64898/2026.05.12.724687 medRxiv

Top 0.1%

70.0%

Molecular similarity searching is a workhorse of cheminformatics, but the dominant Tanimoto/topological-fingerprint paradigm has well-known blind spots. It is highly sensitive to molecular size, suffers from steep activity cliffs, and frequently fails to retrieve scaffold-hopping bioisosteres. A complementary descriptor that has received comparatively little attention is global elemental composition. Despite the conceptual simplicity of comparing molecules by their elemental ratios, no widely deployed method exists for the statistically rigorous identification of "chemical twins" defined by stoichiometric proximity. We address this gap with TwinSAR (Stoichiometric Analysis and Retrieval), an adaptive kernel-based algorithm that combines three methodological innovations: (i) binary fingerprint blocking that partitions molecules by element-presence patterns and reduces the cost of all-pairs comparison from O(NM) to O(Σ_i n_i m_i), enabling million- to billion-scale searches; (ii) a per-block adaptive radial basis function (RBF) kernel whose precision parameter is calibrated independently for each fingerprint block via the median heuristic, providing fair similarity comparison across chemical sub-spaces of vastly different density; and (iii) a logit-transformed Z-score filter that maps bounded RBF scores onto an unbounded scale, allowing high-similarity pairs to be prioritized relative to the empirical score distribution of their own fingerprint block. TwinSAR is offered in two operating modes: (i) a deterministic BULK mode for exact reproducibility; and (ii) a stochastic FAST mode that achieved a 3.29x wall-clock speed-up in the present benchmark while preserving similar unique-query and unique-target coverage. Statistical validation showed that detected twin pairs are 12.7x more similar in absolute ratio space than block-matched random pairs (p < 0.001), while a column-permutation negative control returned a median of zero spurious twins across three independent permutations. A controlled benchmark further established that an 8-element representation (single-element heavy-atom ratios) is sensitivity-equivalent to a comprehensive 254-element representation while running 3.55x faster. As a case study, TwinSAR was deployed in an end-to-end virtual screening pipeline against the BCL-2 target protein, where it reduced a 327,071-compound commercial library to a focused panel of 390 candidates. The chemical interpretability of the retrieved twins is illustrated by their structural diversity around conserved heavy-atom skeletons. TwinSAR therefore provides a fast, conformation-free, and statistically principled prefilter that is fully orthogonal to topological fingerprints.
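The three ingredients listed in this abstract (element-presence blocking, a per-block RBF kernel with a median-heuristic bandwidth, and a logit-transformed Z-score) can be sketched in a few lines. This is not the TwinSAR implementation. The function names, the choice gamma = 1 / median squared distance, the clipping constant `eps`, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, median, pstdev


def presence_key(ratios):
    """Binary element-presence fingerprint used as the blocking key."""
    return tuple(r > 0 for r in ratios)


def sq_dist(a, b):
    """Squared Euclidean distance between two elemental-ratio vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def twin_scores(queries, targets, eps=1e-9):
    """Return {(query_idx, target_idx): z} computed within each block only."""
    blocks = defaultdict(lambda: ([], []))
    for i, q in enumerate(queries):
        blocks[presence_key(q)][0].append(i)
    for j, t in enumerate(targets):
        blocks[presence_key(t)][1].append(j)

    scores = {}
    for q_idx, t_idx in blocks.values():
        # Pairs are compared only inside a block: cost is sum(n_i * m_i).
        pairs = [(i, j, sq_dist(queries[i], targets[j])) for i in q_idx for j in t_idx]
        if len(pairs) < 2:
            continue  # too few pairs to define a block-level distribution
        med = median(d for _, _, d in pairs)
        gamma = 1.0 / med if med > 0 else 1.0  # assumed median heuristic
        logits = []
        for _, _, d in pairs:
            k = min(max(math.exp(-gamma * d), eps), 1.0 - eps)  # keep logit finite
            logits.append(math.log(k / (1.0 - k)))
        mu, sd = mean(logits), pstdev(logits)
        if sd == 0:
            continue
        for (i, j, _), value in zip(pairs, logits):
            scores[(i, j)] = (value - mu) / sd
    return scores
```

Pairs whose Z-score exceeds a chosen cutoff would be reported as candidate twins; the cutoff itself is not specified here and would need to be taken from the preprint.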

3

Structure-guided compound prioritization strategy for virtual screening identifies putative binders for the nuclear receptor LRH-1

Chang-Gonzalez, A. C.; Campbell, A. N.; Bell, E. W.; Blind, R.; Meiler, J.

2026-06-07 bioinformatics 10.64898/2026.06.04.730240 medRxiv

Top 0.1%

66.1%

Compound ranking in structure-based virtual screening notoriously yields highly ranked false positive binders due to variable poses or biases in scoring terms. We developed a compound prioritization strategy that utilizes sampled docked poses from contrasting docking approaches (targeted physics-based docking and blind docking with a generative model) against multiple models of the target protein to train a multi-layer perceptron (MLP). The model predicts binders at the orthosteric ligand-binding pocket of the nuclear receptor LRH-1 (NR5A2). Our approach circumvents the reliance on a single docked pose for scoring compounds or individual scoring metrics for compound ranking. In a separate benchmarking set, we observed that the MLP identifies known binders that are chemically dissimilar from the compounds in the training set and is sensitive to single scaffold modifications, making it a potential tool for lead optimization. We applied our strategy to a prospective virtual screening campaign, which resulted in the discovery of four putative LRH-1 binders. We found that a combination of scoring and prediction metrics enriches for the hit compounds across library sizes. In all, this implementation presents a method to leverage structural and experimental data to aid virtual screening for a challenging protein target.

4

StructureSAFE: A structure-aware chemical language model for unified hit identification and lead optimization

Yang, B.; Xu, K.; Xiang, C.; Lee, B.; Xu, Y.; Li, T.; Shi, Y.; Sinitskiy, A.; Li, J.

2026-07-02 bioinformatics 10.64898/2026.06.28.735128 medRxiv

Top 0.1%

65.0%

Structure-based generative models (SBGMs) hold great promise for accelerating drug discovery by enabling target-aware molecular design. However, existing approaches face fundamental challenges: three-dimensional graph-based models can explicitly incorporate protein structural information but often generate chemically implausible molecules due to limited training data, while chemical language models (CLMs) produce chemically plausible molecules but struggle to leverage three-dimensional structural information for structure-conditioned generation and are hard to extend with lead optimization functionality owing to the nature of SMILES strings. Here, we present StructureSAFE, a structure-aware chemical language model that resolves this trade-off by integrating protein structural and evolutionary encoders with the SAFE molecular representation via a pretraining and fine-tuning scheme, enabling both de novo hit identification and a comprehensive suite of lead optimization subtasks within a unified framework. Comprehensive benchmarking on the MolGenBench dataset demonstrates that StructureSAFE achieves state-of-the-art (SOTA) performance across multiple metrics, with particularly pronounced improvements in chemical plausibility relative to graph-based models lacking pretraining. Evaluation on a rigorously constructed held-out test set further confirms its ability to generate drug-like, synthetically accessible molecules with competitive predicted binding affinities for previously unseen targets in both hit identification and lead optimization settings. In silico case studies across four therapeutically relevant targets validate its capacity to generate chemically plausible molecules that recapitulate key binding interactions of known high-affinity ligands while proposing novel interactions for potentially better affinity and exploring previously unknown regions of chemical space. Taken together, StructureSAFE represents a versatile and practical tool to provide high-quality candidate molecules for augmenting medicinal chemistry workflows in both hit identification and lead optimization campaigns.

5

MolCodon: A Codon-Based Molecular Language for Interpretable Structural Representation and Similarity Search

Sayyah, E.; Kurul, E.; Tunc, H.; DURDAGI, S.

2026-05-21 bioinformatics 10.64898/2026.05.20.726468 medRxiv

Top 0.1%

62.3%

Molecular representation determines which aspects of chemical structure can be learned, compared, and interpreted in computational drug discovery. Existing encodings typically emphasize either compact string description, as in SMILES and SELFIES, or efficient similarity search, as in circular fingerprints, but they may not simultaneously provide deterministic sequence structure, graph-level interpretability, pharmacophore annotation, and high-fidelity molecular reconstruction. Here, we introduce MolCodon, a codon-based molecular language that represents small molecules as deterministic sequences of fixed-width three-character tokens over a five-symbol alphabet, C, N, O, S, and X. Inspired by the triplet organization of the genetic code, MolCodon assigns chemically defined codon families to atoms, bonds, ring and branch topology, fused-ring references, pharmacophore features, bond mobility, charge, and stereochemistry. A deterministic graph traversal with ring-contiguity preservation produces sequences in which chemically meaningful substructures remain locally organized and traceable to the underlying molecular graph. Across around 2.9 million molecules from six commercial screening libraries, MolCodon achieved 98.93% InChIKey-level round-trip fidelity, supporting its use as a high-fidelity sequence representation for drug-like chemistry. MolCodon-derived sparse sequence and trace features further outperformed SELFIES and Group SELFIES across ten QSAR tasks and exceeded classical fingerprint baselines in six out of ten tasks. As an application of the representation, the MolCodon BLAST similarity engine decomposes molecular similarity into ring topology, branch context, attachment architecture, and pharmacophore correspondence, enabling interpretable scaffold-hopping searches. In a PARP1 virtual screening study, MolCodon retrieved scaffold-diverse candidates related to the known PARP-1 inhibitor olaparib. Together, these results establish MolCodon as a new molecular representation paradigm that transforms chemical graphs into high-fidelity, interpretable, and alignment-compatible codon sequences, opening a direct path for bioinformatics-inspired analysis of small-molecule chemical space. The MolCodon encoder, decoder, and BLAST similarity engine are freely available as open-source software at https://github.com/DurdagiLab/MolCodon

6

Linobectide: a mathematical-chemistry modified black-hole algorithmic framework for ORF1p inhibitor design

GRIGORIADIS, I.

2026-05-08 biophysics 10.64898/2026.05.06.723314 medRxiv

Top 0.1%

61.3%

Computer-aided drug design for conditional biomolecular interfaces requires evaluation across more than one receptor structure, docking pose, or scalar score. LINE-1 ORF1p is treated here as a state-family interface target whose relevant behavior is distributed across receptor microstates, assembly-compatible contact neighborhoods, ligand conformers, and perturbation snapshots. This article presents Linobectide as a mathematical-chemistry CADD workflow centered on a modified black-hole algorithm (MBHA) for persistence-weighted prioritization of putative ORF1p inhibitor candidates. Each molecule is represented as a dossier containing standardized descriptors, docking annotations, interaction-class persistence vectors, finite-action stability traces, graph-localization summaries, SPECTRAL-SAR applicability-domain records, and rank-shift diagnostics. The revised analysis emphasizes numerical reporting endpoints: fixed run parameters, baseline comparators, ablation metrics, rank stability, regeneration fractions, protected-elite fractions, and reproducibility indices. Docking is used as an annotation layer rather than as a stand-alone proof of inhibition. The framework is therefore reported as a transparent computational prioritization protocol that generates testable hypotheses for future biochemical and cellular validation, not as experimental proof of ORF1p inhibition or therapeutic activity.

Author summary: Drug-design workflows can become over-dependent on the best docking pose even when an interface target remains functional through alternative contact corridors. Linobectide addresses this issue by ranking candidates only after docking annotations are aggregated across receptor-state and perturbation conditions. The MBHA search promotes a candidate when interaction persistence, finite-action stability, graph localization, SPECTRAL-SAR coherence, applicability-domain support, and reproducibility checks are concordant. The revision removes unsupported claims of performance advantage and replaces them with benchmarkable endpoints that can be compared with docking-only, consensus-docking, and ablated MBHA baselines. The SI Appendix is retained as a figure atlas for state-family construction, graph-localization diagnostics, docking provenance, consensus geometry, and comparative triage.

7

Advancing in silico drug design with Bayesian refinement of AlphaFold models

Sen, S.; Hoff, S. E.; Morozova, T. I.; Schnapka, V.; Bonomi, M.

2026-05-06 bioinformatics 10.1101/2025.06.25.661454 medRxiv

Top 0.1%

61.2%

Virtual screening has become an indispensable tool in modern structure-based drug discovery, enabling the identification of candidate molecules by computationally evaluating their potential to bind target proteins. The accuracy of such screenings critically depends on the quality of the target structures employed. Recent advances in protein structure prediction, particularly AlphaFold2, have revolutionized this field with unprecedented accuracy. However, AlphaFold2 models often exhibit limitations in local structural details, especially within binding pockets, which limit their utility for small molecule docking. In contrast, molecular dynamics simulations with accurate atomistic force fields can refine protein structures, but lack the ability to leverage the structural information provided by deep learning approaches. Here, we introduce bAIes, an integrative method that bridges this gap by combining physics-based force fields with data-driven predictions through Bayesian inference. Crucially, bAIes demonstrates a superior ability to discriminate between binders and non-binders in virtual screening campaigns, outperforming both AlphaFold2 and molecular dynamics-refined models. By enhancing the usability of AlphaFold2 models without requiring extensive experimental or computational resources, bAIes offers a convenient solution to a longstanding challenge in structure-based drug design, potentially accelerating the early phases of drug discovery.

8

Comparative Analysis of Relative Ligand Binding Free Energy Simulation Methods: Amber-TI, GROMACS-NETI, OpenMM-FEP, and BLaDE-MSLD

Lee, H.; Kim, I.; Kim, S.; Bae, M.; Jeong, B.; Kim, S.; Jo, S.; Lee, J.; Im, W.

2026-04-24 biophysics 10.64898/2026.04.22.720125 medRxiv

Top 0.1%

60.2%

Structure-based drug design has become increasingly important in the pharmaceutical industry for accelerating the discovery of effective drug candidates. In particular, ligand binding free energy serves as a critical metric for predicting drug efficacy during the key stages of hit discovery and lead optimization. Continuous progress has been made in the prediction of ligand binding free energies, but direct comparisons of different methods using the same force field remain challenging because each method is implemented in a different simulation engine. In this study, we present a direct comparison of four popular methodologies (Amber-TI, GROMACS-NETI, OpenMM-FEP, and BLaDE-MSLD) for calculating relative binding free energies (ΔΔGbind) with the same Amber protein and ligand force fields using MolCube Alchemical Free Energy Simulator (MolCube-AFES), which provides an input generation workflow to support ΔΔGbind calculations of all four methods. We used 80 alchemical transformations (from the JACS benchmark set by Wang et al.) and two additional applications to compare the predicted ΔΔGbind from the four methods against experimental measurements. All four methods reproduced experimentally observed trends, with most transformations within ±2 kcal/mol of experiment, and showed broadly comparable accuracy with no statistically significant performance differences across the benchmark dataset. These results demonstrate that MolCube-AFES enables controlled, cross-platform benchmarking and show that all four alchemical free energy methods deliver statistically equivalent accuracy, with method selection guided by workflow requirements such as throughput, portability, and perturbation network design rather than expected differences in performance.
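The "within ±2 kcal/mol" criterion used in this abstract is simple arithmetic, and a short sketch makes it concrete. The helper below is hypothetical: it takes paired predicted and experimental ΔΔGbind values in kcal/mol and reports the fraction within a tolerance together with the mean absolute error. No values from the preprint are reproduced here.

```python
def accuracy_summary(predicted, experimental, tolerance=2.0):
    """Return (fraction within tolerance, mean absolute error) in kcal/mol."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    fraction = sum(abs(err) <= tolerance for err in errors) / len(errors)
    mae = sum(abs(err) for err in errors) / len(errors)
    return fraction, mae
```

Running the same summary on each engine's predictions for an identical set of transformations gives a like-for-like comparison, although judging whether differences are statistically significant needs an additional test that this sketch does not cover.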

9

Multi-level, multi-body atomic interaction graphs for machine learning-based prediction of protein-ligand binding energies

Le, T. T. H.; Nguyen, B. T.; Vo, H.; Nguyen, N. H.; Nguyen, D. D.

2026-06-07 bioinformatics 10.64898/2026.06.05.730001 medRxiv

Top 0.1%

59.7%

Accurate prediction of binding affinity is crucial for rational drug design and discovery. Traditional computational methods often rely on complex scoring functions that incorporate a multitude of physical and chemical descriptors, leading to high computational demands and sometimes limited generalizability. In this work, we propose a novel scoring function that models multi-level, multi-body atomic interactions using graph-based representations. Our method constructs comprehensive interaction graphs that incorporate both pairwise and triplet-wise atomic features that help capture cooperative spatial patterns essential for binding affinity prediction. By employing a feature fusion strategy, GMI-Score maintains model simplicity while enhancing accuracy. Extensive evaluation across multiple datasets, such as PDBbind v2013, PDBbind v2016, PDBbind v2020, CSAR-NRC-HiQ, and PDBbind-Redocked, demonstrates that our model consistently outperforms state-of-the-art scoring functions, achieving Pearson correlation coefficients up to 0.877. Furthermore, it retains strong predictive power under strict data leakage controls and realistic docking conditions, highlighting its robustness and generalizability.

Scientific Contribution: In this study, we present a scoring methodology that systematically captures higher-order atomic interactions within a unified graph framework, marking a conceptual shift in cheminformatics scoring functions. Its consistent outperformance of existing methods and strong validity under redocked and withheld data scenarios demonstrate its utility for broad-scale molecular modeling applications and open cheminformatics workflows.

10

Attracting Cavities 3.0: Faster and More Versatile Molecular Docking for the SwissDock Webserver

Roehrig, U. F.; Mathieu-Bugnon, M.; Zoete, V.

2026-04-23 bioinformatics 10.64898/2026.04.21.719847 medRxiv

Top 0.1%

56.3%

Motivation: Molecular docking is a pillar of structure-based drug design and shows advantages in structure prediction of small-molecule ligand-protein complexes over co-folding methods for novel ligands and novel binding pockets. Here, we describe substantial improvements of our physics-based docking algorithm Attracting Cavities, which is widely used through the SwissDock webserver. Results: AC 3.0 includes enhanced sampling features, new functionalities, and technical improvements. These lead to better sampling at lower execution times and higher versatility. Comparison with AutoDock Vina demonstrates better docking results on multiple test sets. Availability: AC 3.0 will be made available free of charge through the SwissDock webserver (www.swissdock.ch).

11

ConfDock: Atom-specific Uncertainty Quantification for Molecular Docking via Conformal Prediction

Hao, H.; Elhendawy, N.; Wang, Y.; Lu, C.

2026-07-01 biochemistry 10.64898/2026.06.29.735353 medRxiv

Top 0.1%

56.0%

Molecular docking is widely used in structure-based drug discovery, yet most approaches provide point estimates without rigorous uncertainty quantification. This limitation makes it difficult to assess when a predicted pose should be trusted, especially when docking methods are applied to diverse protein-ligand systems. We present ConfDock, a conformal prediction (CP) framework for constructing atom-specific prediction intervals for ligand docking poses. ConfDock combines graph neural network (GNN) based quantile estimation with split conformal calibration, producing intervals that adapt to local protein-ligand environments while retaining distribution-free finite-sample coverage guarantees. We evaluate ConfDock on 238 protein-ligand complexes across four docking methods representing distinct computational paradigms. The proposed approach yields substantially narrower prediction intervals compared to standard split CP (57.2% average reduction in mean interval width, up to 74.5%) while maintaining target coverage across all evaluated settings. Ablation analysis indicates that the GNN captures the dominant structure-dependent variability in uncertainty, whereas the conformal calibration step provides a bounded adjustment to ensure coverage guarantees. These results demonstrate that combining learned, structure-aware quantile estimation with conformal calibration enables rigorous uncertainty quantification for molecular docking at atom-level resolution.
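Combining learned quantile estimates with split conformal calibration is commonly done in the style of conformalized quantile regression, and the sketch below shows that generic recipe. It is an assumption that ConfDock follows this exact form. Here `lower` and `upper` stand in for per-atom quantile predictions on a held-out calibration set (for example, bounds on positional error), `observed` holds the true values, and all names are hypothetical.

```python
import math


def conformal_offset(lower, upper, observed, alpha=0.1):
    """Calibration offset giving >= 1 - alpha marginal coverage (split CP)."""
    # Conformity score: how far the truth falls outside the predicted band
    # (negative when the truth lies inside it).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, observed))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def calibrated_interval(lo, hi, offset):
    """Widen (or shrink, if offset < 0) a predicted band by the offset."""
    return lo - offset, hi + offset
```

Because the offset is added to bands that already vary per atom, the resulting intervals stay adaptive to the local environment while the coverage guarantee comes from the calibration step alone.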

12

Conformational Preference Classification of Integrin-Binding Ligands Using Free Energy Perturbation

Vögele, M.; Shahoei, R.; Petridis, L.; Li, J.; Lin, F.-Y.; Wang, L.; Springer, T. A.; Vendome, J.

2026-04-30 biophysics 10.64898/2026.04.27.721214 medRxiv

Top 0.1%

55.8%

Integrins are crucial cell adhesion receptors and attractive therapeutic targets, but developing oral small-molecule inhibitors has been challenging, at least in part due to inadvertent partial agonism caused by stabilization of the integrin's open, high-affinity state. To address this challenge, we present a computational approach using Absolute Binding Free Energy Perturbation (AB-FEP) calculations to predict whether a ligand will stabilize the open or closed integrin states, leveraging the difference between the ligand's binding free energy to the respective end states. Despite challenges posed by Ca²⁺ and Mg²⁺ ions, metal-coordinating residues in the binding pocket, and the subtlety of structural differences between states, AB-FEP achieves excellent classification performance on a set of known opening and closing ligands, significantly outperforming docking scores and MM-GBSA results. We also show a good correlation between AB-FEP binding free energy differences and experimental values. Furthermore, AB-FEP provides insights into intermediate integrin states, and analysis of simulation trajectories confirmed that the formation of a water-mediated hydrogen bond network with an ion in the binding pocket is characteristic of closing ligands. This work demonstrates AB-FEP as a robust method for classifying integrin ligands and understanding their functional mechanisms, offering valuable guidance for designing safe and conformationally selective integrin therapeutics.

13

Pharmacological proximities in the GPCR family discovered using contact-informed amino-acid and binding pocket similarities

So, S. S.; Ngo, T.; Ilatovskiy, A. V.; Finch, A. M.; Riek, R. P.; Abagyan, R.; Smith, N. J.; Kufareva, I.

2026-05-06 bioinformatics 10.64898/2026.05.02.720972 medRxiv

Top 0.1%

53.9%

Understanding protein proximities in the theoretical ligand space is essential for developing therapeutics with desirable polypharmacology, predicting off-targets, and discovering surrogate ligands for poorly characterized proteins. This is especially important for G protein-coupled receptors (GPCRs), a major class of drug targets, many of which still lack known ligands. Circumventing this limitation, we present GPCR-CoINPocket v2, a contact-informed metric for detecting GPCR pharmacological similarities from amino-acid sequences alone. We first establish a "gold standard" of pharmacological relatedness using ChEMBL-derived ligand sets. We then replace traditional evolutionary amino acid similarity matrices with a chemically-informed matrix derived from protein:ligand interaction patterns across 3,306 structures, significantly improving early detection of shared pharmacology between distantly homologous receptors. An additional unconstrained, contact-informed matrix further enhances predictive performance. Pilot application of the method revealed previously unrecognized similarities between the β2 adrenoceptor and three Class A peptide GPCRs, which we confirmed experimentally by demonstrating the binding of select ligands of these receptors to the β2 adrenoceptor. Dimensionality reduction of similarity scores recapitulates known receptor relationships and predicts neighbors of orphan GPCRs later confirmed experimentally. Overall, GPCR-CoINPocket v2 provides a powerful sequence-based framework to prioritize ligand space, predict polypharmacology, and accelerate GPCR drug discovery and deorphanization.

14

BoltzMol-1: Towards Reliable Virtual Screening for Fast and Cost-Effective Hit Discovery

Getz, N.; Smith, G.; Colgan, A.; Fan, V.; Cavalleri, L.; Capponi, F.; Wohlwend, J.; Gitter, A.; Kritzer, J.; Maiorano, M.; Wlodarchak, N.; Corso, G.; Passaro, S.

2026-07-06 biochemistry 10.64898/2026.07.04.736485 medRxiv

Top 0.1%

53.0%

We present BoltzMol-1, a small-molecule hit discovery pipeline centered on an optimized version of Boltz-2 and explicitly adapted for prospective discovery. Reliable hit discovery that generalizes across target classes (rather than only the well-characterized families that dominate existing ligand data) would broaden the range of biology accessible to small-molecule intervention and reduce reliance on resource-intensive high-throughput screening. Towards this goal, the system prioritizes compounds for rapid experimental validation by coupling model-driven ranking with streamlined procurement from commercial catalogs. To improve developability at the point of selection, we introduce a suite of ADMET models for kinetic solubility (logS), lipophilicity (logD), and Caco-2 permeability. These models act as an early triage layer, systematically filtering out compounds with unfavorable physicochemical and absorption properties prior to synthesis or purchase. Across a panel of ten targets (most with no representation in the underlying affinity training data) we observe strong prospective performance on challenging systems. Functional actives or binders were identified for 6 of 10 targets, despite modest experimental budgets of 28-96 compounds per target. These results include successes on receptors and enzymes traditionally considered difficult for structure- or ligand-based approaches. Collectively, this work establishes a practical framework for low-throughput, cost-constrained discovery campaigns capable of delivering chemically tractable binders with favorable property profiles.

15

ADMETron: An AI-driven SaaS platform for comprehensive ADMET prediction and compound prioritisation

Nair, D. N.; Yadav, R. S.; Jondhale, P. M.; Didhate, S.; Gunjal, G.; Ranjit, A.; Patil, P.; Dawande, A.; Shisode, A.; Bhagwat, A.; Scheele, J.; Zharavin, V.; Arora, S.

2026-06-13 bioinformatics 10.64898/2026.06.13.732026 medRxiv

Top 0.1%

51.8%

ONTOSIGHT® ADMETron is a high-performance, SaaS-based AI platform designed for the rapid profiling and visualization of ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties. The platform integrates a highly interactive web interface with a robust predictive engine, enabling the batch processing of compounds for high-throughput virtual screening. The core engine employs an ensemble model that combines recurrent neural network (RNN)-derived embeddings from SMILES strings with physicochemical descriptors, which are fed into gradient boosting machines (GBMs). This architecture provides accurate predictions across 34 distinct ADMET endpoints, encompassing critical categories such as physicochemical properties, absorption, CYP450 inhibition, hERG toxicity, and mutagenicity. The platform's superior performance is quantitatively validated by its top-tier ranking on the Therapeutics Data Commons (TDC) ADMET Benchmark Group, demonstrating robustness and generalizability with notable results including 2nd place for Ames mutagenicity (AUROC 0.870) and 2nd place for LD50 (MAE 0.573). In addition to its predictive capabilities, ADMETron introduces a novel SAR analysis framework that enables real-time comparison of multiple compounds and approved drugs through an interactive radar graph visualization. Comparative evaluation against widely used online ADMET platforms demonstrated broader endpoint coverage, including pharmacokinetic, physicochemical, and medicinal chemistry assessments within a unified environment. The combination of benchmark-validated predictive performance, comprehensive ADMET profiling, and advanced visualization tools positions ADMETron as a next-generation platform for virtual screening, lead optimization, and data-driven decision-making in modern drug discovery (https://admetron.partex.ai/).

16

Does the sequence of a disordered protein encode small molecule binding paths?

Louet, A. A. B.; Hummer, G.; Vendruscolo, M.

2026-05-23 biophysics 10.64898/2026.05.20.726646 medRxiv

Top 0.1%

51.8%

Ligand binding to intrinsically disordered proteins resists description in terms of conventional binding pockets, yet it can be analysed as a dynamic process in which ligands move across transient surface interaction sites. Here we characterise a pathway-based representation in which ligand binding is described as a sequence of transitions between residue-defined microstates, enabling ligand-specific effects to be distinguished from intrinsic properties of the peptide conformational ensemble. Using all-atom molecular dynamics simulations of Aβ42 and the C-terminal region of α-synuclein in complex with chemically diverse small molecules, we construct transition matrices that encode ligand movement across the peptide surface and use Markov state models to identify dominant binding pathways and relative binding propensities. Pairwise enrichment-factor and AUC analyses reveal strong conservation of the highest-ranked pathways across chemically diverse ligands, with enrichment factors of 15-45 for the top-ranked states and AUC values typically ≥0.75, well above random expectation. These dominant pathways are also preserved across changes in pH and temperature, whereas a urea control, included as a non-specific binder, shows reduced enrichment, indicating that ligands primarily modulate pathway weights rather than define the underlying network topology. Ensemble docking across chemically diverse libraries further supports the presence of recurrent ligand-accessible hotspots within the peptide conformational ensemble. Building on this framework, we apply a prospective screening pipeline to Aβ42, combining MSM-derived hotspots with sequence-based Ligand-Transformer scoring and Gnina docking across 1.66 million compounds, to nominate 19 candidates for prospective experimental evaluation. Together, these results indicate that disordered protein sequences give rise to conformational ensembles that exhibit characteristic binding pathways for small molecules.
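The transition matrices described in this abstract can be illustrated with a minimal sketch. It assumes the trajectory has already been reduced to one microstate label per frame (for example, the residue the ligand contacts most closely), then counts transitions at a fixed lag and row-normalizes them. The labels, the lag of one frame, and the function name are illustrative assumptions, and a production Markov state model would add steps such as lag-time validation that are omitted here.

```python
from collections import Counter, defaultdict


def transition_matrix(trajectory, lag=1):
    """Row-normalized transition probabilities between microstate labels."""
    counts = defaultdict(Counter)
    # Pair each frame with the frame `lag` steps later and count the jump.
    for src, dst in zip(trajectory, trajectory[lag:]):
        counts[src][dst] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }
```

Comparing such matrices between ligands is one way to ask whether a ligand changes the pathway weights or the set of accessible transitions.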

17

F.A.D.E. (Fully Agentic Drug Engine): A Conversational AI Platform for Drug Discovery

Kantorow, J.; Mani, N.; Mohanraj, N. R.; Zong, X.

2026-06-25 biophysics 10.64898/2026.06.20.733481 medRxiv

Top 0.1%

50.2%

Drug discovery remains one of the costliest and most time-intensive endeavors in the pharmaceutical pipeline, with average development costs exceeding $2.3 billion per drug, timelines spanning more than a decade, and attrition rates above 90% in clinical trials. While computational methods have expanded the searchable chemical space, current pipelines remain fragmented and largely inaccessible to researchers without deep interdisciplinary expertise. Here we present F.A.D.E. (Fully Agentic Drug Engine), a multi-agent, open-source platform that converts natural language queries into potential drug candidates, substantially lowering the expertise barrier to advanced computational drug discovery. F.A.D.E. employs a three-branch hierarchical architecture that adapts to the level of available structural data for any protein target, integrating structure prediction, binding pocket detection, equivariant diffusion-based de novo ligand generation, and binding affinity estimation into a single automated pipeline. We validate F.A.D.E. on two structurally distinct targets: the epidermal growth factor receptor kinase domain (EGFR), a well-established oncology target, and cellular retinol-binding protein 1 (CRBP1), a lipid-binding protein involved in retinoid metabolism. For EGFR, our generated candidates achieved QED scores of 0.85 compared to 0.46 for the co-crystallised reference ligand, demonstrating marked improvement in predicted drug-likeness. Results across both targets confirm that F.A.D.E. can reliably generate chemically tractable, drug-like hit compounds across diverse protein classes from simple natural language input.

18

The genetically-encoded amino acids distribute non-randomly within a functionally-relevant chemical space

Brown, S. M.; Hervey, J.; Dean, S. N.; Vora, G. J.

2026-05-07 synthetic biology 10.64898/2026.05.06.723277 medRxiv

Top 0.1%

49.3%

The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in primarily two structurally-relevant physicochemical properties: hydrophobicity and molecular volume, and to a lesser extent charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the highest occupied molecular orbital and lowest unoccupied molecular orbital (HOMO-LUMO) gaps for 84 xeno amino acids and constructed 10 million random 20-mer amino acid alphabets to determine where C20 fits amongst this background. The HOMO-LUMO gap measurements demonstrated that C20 also exhibits a non-random distribution in this property, as it does in hydrophobicity and volume. However, unlike hydrophobicity and volume, this distribution is non-random across an unevenly broad range. The results expand upon previous theory and suggest HOMO-LUMO gap energies as one property synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets. Highlights: (1) Life's amino acid alphabet is non-randomly distributed within an expanded, computationally generated chemistry space derived from large-scale quantum chemistry simulations. (2) Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties like reactivity as measured by frontier molecular orbitals. (3) Findings here provide a theoretical framework to guide the design of novel proteins and development of synthetic amino acid alphabets.
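
The comparison of C20 against random alphabets can be sketched as a simple Monte Carlo test. The statistic used here (the range of the property values), the pool contents, the sample count, and the function names are all assumptions made for illustration; the abstract does not state which coverage statistics or sampling scheme the authors used.

```python
import random


def spread(values):
    """Illustrative coverage statistic: range of the property values."""
    return max(values) - min(values)


def fraction_not_exceeding(reference, pool, n_samples=10_000, stat=spread, seed=0):
    """Fraction of random alphabets whose statistic is <= the reference's.

    Random alphabets have the same size as `reference` and are drawn
    without replacement from `pool`.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    ref_stat = stat(reference)
    k = len(reference)
    hits = sum(stat(rng.sample(pool, k)) <= ref_stat for _ in range(n_samples))
    return hits / n_samples


# Toy pool of made-up gap values (not data from the study).
pool = [float(i) for i in range(104)]
# A 20-member reference spanning the whole pool range, so nothing can exceed it.
reference = [pool[0], pool[-1]] + pool[1:19]
result = fraction_not_exceeding(reference, pool, n_samples=1_000)
```

A value of `result` near 0 or 1 would indicate that the reference alphabet sits in a tail of the random background for the chosen statistic; in this toy case the reference was constructed to sit at the extreme.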

19

Parameterization of the PA endonuclease bimetallic center reveals the dynamics of clinically relevant mutations

Wang, L.; Li, P.; Sztain, T.

2026-06-16 biophysics 10.64898/2026.06.12.731895 medRxiv

Top 0.1%

46.2%

Influenza A virus continues to impose a major global health and economic burden through seasonal epidemics and occasional pandemics, highlighting the critical need for continued antiviral development. As the latest addition to anti-influenza therapy, baloxavir marboxil (BXM) targets the highly conserved PA N-terminal endonuclease domain (PAN), blocking the cap-snatching process essential for viral transcription initiation. However, the rapid emergence of resistance mutations significantly reduces BXM susceptibility and compromises its clinical efficacy. Understanding the dynamics underlying resistance through computational modeling has been hindered by the complex electronic properties of the bimetallic catalytic center within the PAN active site, posing a challenge for accurate parameterization. Therefore, in this study, we systematically benchmarked metal-parameterization strategies for molecular dynamics (MD) simulations, including non-bonded, bonded, and hybrid models, using wild-type PAN in both apo and drug-bound states. Identification of reliable parameterization schemes enabled MD simulations of five clinically relevant mutants, I38T/F/M, A36V, and E23K, revealing how each reshapes the conformational landscape to modulate drug binding modes. Together, our results provide a path toward modeling complex sites in metalloenzymes and a mechanistic foundation for vulnerabilities in PAN to guide structure-based optimization of next-generation inhibitors.

20

On the applicability domain of HADDOCK3 for protein-aptamer docking: documented failure modes from a 5x7 cross-target screening matrix and a 1676 aa receptor case study (P01031)

Dohi, E.

2026-05-12 bioinformatics 10.64898/2026.05.11.724398 medRxiv

Top 0.1%

45.7%

We screened a 5 receptor x 7 aptamer = 35-cell cross-target matrix with HADDOCK3 [1] under blind ambiguous-interaction-restraint (AIR) protocols on AlphaFold-modelled receptors. The screen surfaced 12 operationally distinct failure modes (collapsing to ~8 conceptual classes; §3.1). The K_D-calibration subset is n = 4 cells with literature K_D records under matched assay conditions; the broader cohort includes ≥ 6 biological cognate or intended-cognate cells. The principal case study is P01031 (complement C5, 1676 aa, ≥ 12 structural domains): all 7 panel members produced positive HADDOCK3 top-1 scores under a scale-adaptive AIR. Score-term decomposition locates the anomaly in the AIR term (+217 to +268 to top-1 score). With AIR zeroed, scores fall to -131 to -74 -- the small-receptor regime. Boltz-2 cofolding chain-pair ipTM (cpi_AB) is an independent channel: P01031 shows the lowest median cpi_AB (0.211; 0/7 above the 0.5 confident-interface threshold). To our knowledge, this is the first reported case study of a 1676 aa multi-domain receptor exhibiting this signature under blind scale-adaptive AIR -- an n = 1 mechanistic case, not a statistical generalisation. We adapt the QSAR applicability domain concept [14-16] to in silico aptamer screening. §3.7 reports an empirical Mode 1 mitigation (pLDDT-aware AIR prefilter; cohort Jaccard recovery ~10x).
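
For readers unfamiliar with the Jaccard measure mentioned at the end of this abstract, a minimal sketch follows. The abstract does not say which sets the index is computed over, so the residue-number sets and variable names below are hypothetical and serve only to show the arithmetic.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical residue-number sets, e.g. a reference interface and a predicted one.
reference_site = {10, 11, 12, 13}
predicted_site = {12, 13, 14, 15}
overlap = jaccard(reference_site, predicted_site)
```

Here two of the six distinct residues are shared, so `overlap` is one third. A reported "recovery" factor would then be a ratio of such indices before and after a mitigation step, under whatever set definition the authors adopt.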